Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing

Background The sheer number of measures evaluating mobility and inconsistencies in terminology make it challenging to extract potential core domains and items. Automating a portion of the data synthesis would allow us to cover a much larger volume of studies and databases in a smaller fraction of the time compared to the usual process. Thus, the objective of this study was to identify a comprehensive outcome set and develop preliminary banks of items of mobility among individuals with acquired brain injury (ABI) using Natural Language Processing (NLP). Methods An umbrella review of 47 reviews evaluating the content of mobility measures among individuals with ABI was conducted. A search was performed on 5 databases between 2000 and 2020. Two independent reviewers retrieved copies of the measures and extracted mobility domains and items. A pre-trained BERT model (state-of-the-art model for NLP) provided vector representations for each sentence. Using the International Classification of Functioning, Disability, and Health Framework (ICF) ontology as a guide for clustering, a k-means algorithm was used to retrieve clusters of similar sentences from their embeddings. The resulting embedding clusters were evaluated using the Silhouette score and fine-tuned according to expert input. Results The study identified 246 mobility measures, including 474 domains and 2109 items. Encoding the clusters using the ICF ontology and expert knowledge helped in regrouping the items in a way that is more closely related to mobility terminology. 
Our best results identified banks of items that were used to create a comprehensive outcome set of mobility comprising 24 domains: Upper Extremity Mobility, Emotional Function, Balance, Motor Control, Self-care, Social Life and Relationships, Cognition, Walking, Postural Transition, Recreation and Leisure Activities, Activities of Daily Living, Physical Functioning, Communication, Work/Study, Climbing, Sensory Functions, General Health, Fatigue, Functional Independence, Pain, Alcohol and Drugs Use, Transportation, Sleeping, and Finances. Conclusion The banks of items of mobility domains represent a first step toward establishing a comprehensive outcome set and a common language of mobility to develop the ontology. They enable researchers and healthcare professionals to begin exposing the content of mobility measures as a way to assess mobility comprehensively. Supplementary Information The online version contains supplementary material available at 10.1186/s12883-022-02938-1.


Background
Acquired Brain Injury (ABI), including traumatic brain injury (TBI) and stroke, is the most prevalent cause of disability globally [1][2][3]. According to the World Health Organization, the global incidence of all-severity TBI is estimated at 69 million people, while 15 million people suffer a stroke worldwide each year [4][5][6]. Among the 1.5 million Canadians with ABI that go through the care continuum annually, over 60% report ongoing restrictions in mobility and participation in societal roles [5]. Individuals with ABI can continue to experience improvements in mobility, and thereby in participation and well-being, when rehabilitation interventions can be offered in the community. However, access to the rehabilitation pathway is often complex and time-consuming [7][8][9]. Thus, the effect on individuals, health care systems, and society suggests a greater need to focus attention on the long-term consequences, management, and rehabilitation of people with ABI [10].
Mobility is a multidimensional construct defined through both theoretical and empirical approaches. From a theoretical point of view, mobility has frequently been defined in terms of life-space frameworks as the ability to move oneself, at any age, within environments that expand from one's home to the neighbourhood and regions beyond [11][12][13][14][15][16][17][18]. Mobility is influenced by five vital inter-related determinants (physical, environmental, cognitive, psychosocial and financial) [14], and this is reflected in the International Classification of Functioning, Disability, and Health (ICF) framework core set [19]. Empirical studies have also focused on the effects of the built environment, including technological components such as mobility aids, on community mobility [20,21].
Selection of a suitable outcome measure is critical to accurately characterize and monitor changes in mobility during rehabilitation interventions for adults with ABI [22]. However, selection can pose a challenge to both researchers and clinicians as the range of outcome measures available in the clinical research literature is vast, and distinctions between them are often not clear [23,24]. Researchers and clinicians also need to consider the content of measures and whether the domains evaluated match research and clinical objectives. Multifaceted assessments of mobility among individuals with ABI can assist in the development of individualized rehabilitation treatment plans that could enhance patients' global health status and allow the evaluation of the long-term effectiveness of interventions [25,26].
Mobility is commonly assessed through performance-based measures (e.g., walking tests) or clinician-reported outcomes (e.g., Disability Rating Scale) [27][28][29]. Although these measures capture some aspects of functional capacity, they are not comprehensive enough to evaluate patients' perspective on their function, nor the effects of their limitations on everyday life. In the last 20 years, advances in measurement have brought the assessment of quality of life through patient-reported outcome (PRO) measures into research and clinical practice [30,31]. In particular, the National Institutes of Health's Patient-Reported Outcomes Measurement Information System (PROMIS), the Quality of Life in Neurologic Disorders (Neuro-QoL) and the Traumatic Brain Injury Quality of Life (TBI-QOL) initiatives have pioneered the development of PRO measures [30][31][32][33]. These initiatives have resulted in measures that allow comparison across conditions over time and testing of all levels of function with one measure, reduce the administration of irrelevant items to a given individual, and minimize testing time by reducing the overall number of items administered through short forms [26,32,33]. Although these initiatives have made great advances in general population and neurological population assessment, no single measurement system can capture the multi-dimensionality of mobility among individuals with ABI.
Core Outcome Sets (COS) developed by researchers and patients allow interventions to be evaluated by using an agreed-upon set of outcomes that can be compared across studies, clinical care programs and settings. A COS includes measures, tools, and endpoints to assess a minimum list of impacts and demonstrate changes. The PROMIS (www.nihpromis.org, March 16, 2021) is charged with developing improved PROs applicable to all areas of chronic illness and involving several domains such as physical functioning and disability. PROMIS is the most ambitious approach yet to these issues [34][35][36]. In simplest terms, PROMIS seeks to employ the best items in the best ways [34][35][36] with a focus on items that are most relevant to study endpoints in clinical trials and observational studies. Optimal instrument development requires item improvement. Systematic approaches to improving items need to ensure that items fully cover the construct of interest and to adjust item banks accordingly: if data indicate that a given item is problematic, it is removed or revised to increase its relevance and clarity.
Compared to traditional manual consensus, machine learning (ML) helps researchers to develop item banks more efficiently and to synthesize a volume of literature that would be nearly impossible to handle manually. ML is a subset of Artificial Intelligence that enables computers to learn without being explicitly programmed with predefined rules [37]. In the rehabilitation sciences, building computer programs that can extract and process knowledge from text documents at a level that is usable by experts in the domain requires several elements that can generally be associated with intelligence [37,38]. This predictive ability enables ML to handle massive datasets with efficiency and accuracy. ML algorithms are categorized into supervised learning, unsupervised learning, and reinforcement learning [39]. Natural language processing (NLP) is a field of Artificial Intelligence, often built on ML, that focuses on textual data [40]. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable [40]. For example, a key feature of NLP is the generation of embeddings for spans of text [41]. Text embeddings can be used to ease learning in downstream tasks and naturally encode similarity at either the word level or the sentence level [42].
Properly classifying content from mobility measures is needed to identify relevant texts. Often, this process relies on pre-defined static vocabularies that describe the mobility domains. To keep pace with evolving knowledge, the initial system vocabularies should evolve automatically so that they correctly reflect our growing understanding of mobility. Our goals for this work were to identify optimal domains by extracting and classifying items from published research on mobility measures. We did this using an NLP technique to create sentence embeddings to inform the mobility ontology. NLP was selected as an approach robust enough to develop preliminary banks of items of mobility that can be used to evaluate each domain in a comprehensive outcome set of mobility among individuals with ABI.

Objective
Using NLP, we aimed to: (1) identify a comprehensive outcome set of mobility, and (2) develop preliminary banks of items of mobility among individuals with ABI.

Step 1: Item selection process
To develop preliminary banks of items of mobility among individuals with ABI, we conducted a comprehensive umbrella review of mobility measures among individuals with ABI [43] following the 10 steps of the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) guideline for systematic reviews [44]. Subsequently, we conducted focus group discussions among clinicians and individuals with ABI and their caregivers to identify factors limiting or enhancing mobility that need to be considered when evaluating mobility [45].
1.1. Search strategy: A comprehensive search of the literature was performed using the electronic databases Ovid MEDLINE, CINAHL, Cochrane Library and EMBASE from 2000 to March 2020. The search was conducted in collaboration with a health sciences librarian to ensure that the review included the appropriate and necessary keywords. A combination of Medical Subject Headings (MeSH) terms, subject headings and/or key words was used. Three groups of terms were generated describing: (1) the population "acquired brain injury" AND; (2) the outcome measure "mobility" AND; (3) the psychometric properties. Terms within each group were combined with the Boolean operator 'OR'. Because the search retrieved different types of studies, it was narrowed with a filter specifying the type of study: systematic review, review, and meta-analysis. This filter was used to avoid missing important information related to mobility measures.
1.2. Select abstracts and full text articles: Inclusion of articles was based on the agreement between two independent reviewers. Disagreements were resolved by discussion and consensus. If required, a third reviewer was consulted. The reference lists of the articles included for full text screening were also hand-searched to identify additional relevant articles. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram [46] was used to guide the selection process.
1.3. Eligibility criteria: Inclusion criteria for the umbrella review were reviews published in peer-reviewed journals that included individuals with ABI (stroke, traumatic brain injury) over 18 years old, reported a clear objective of identifying measures of mobility, and included either multiple or single measure(s) of mobility drawing on different sources of information (i.e., clinicians, patients, and technology).
The exclusion criteria were reviews investigating effectiveness of interventions or treatments, monitoring recovery, focusing on diagnostic screening or prognosis, clinical commentaries, case reports, non-human studies and grey literature. Also, systematic reviews not published in English or French were excluded.
1.4. Data extraction: Two independent reviewers extracted the measures from the reviews, retrieved copies of measures from the literature, and included the non-covered measures identified from the focus groups. They extracted measures' domains and items manually, to avoid missing relevant information. Also, they added mobility domains (i.e. factors) identified from the focus groups.
Step 2: Data cleaning
The data cleaning process ensures that the domains and items are consistent and accurate. The following steps were applied to the processed terms using Microsoft Office Excel 2010 (Additional file 1: Appendix 1 presents the functionalities that were used in this process): converted to lowercase letters.
2.5. Extend acronyms and abbreviations to their full form: because they caused mismatches in the string-matching process, acronyms and abbreviations were expanded; for example, 6MWT becomes six-minute walking test, BI becomes brain injury, and so on.
2.6. Fixing numbers and number signs.
2.7. Remove white spaces, non-printing characters, typos and punctuation from the sentence, and use underscore (_) instead of dash (-).
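The cleaning steps listed above were carried out in Excel; as an illustration only, they could be approximated in Python along the following lines. This is a minimal sketch: the function name `clean_item` and the `ACRONYMS` table are hypothetical, only the two expansions shown come from the text, and step 2.6 (numbers and number signs) and typo correction are not covered.

```python
import re

# Hypothetical acronym table: only these two expansions are given in the text.
ACRONYMS = {
    "6MWT": "six-minute walking test",
    "BI": "brain injury",
}


def clean_item(text: str, acronyms: dict = ACRONYMS) -> str:
    """Normalise one domain or item string (sketch of steps 2.5 and 2.7)."""
    # Expand acronyms first, while their original casing is still intact.
    for short, full in acronyms.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    # Convert to lowercase letters.
    text = text.lower()
    # Use underscore instead of dash.
    text = text.replace("-", "_")
    # Turn every run of white space (tabs, newlines, ...) into one space.
    text = re.sub(r"\s+", " ", text)
    # Drop non-printing characters such as zero-width spaces.
    text = "".join(ch for ch in text if ch.isprintable())
    # Remove punctuation, keeping letters, digits, underscores and spaces.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse any spaces left behind and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


example = clean_item("  Performs the 6MWT,  post-BI! ")
```

Expanding acronyms before lowercasing is a design choice of this sketch, so that a short token such as BI is matched only in its upper-case form.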
Step 3: The proposed model
Figure 1 presents an overview of the proposed model that was used to analyze the data using the NLP technique. Python 3.0 Release was used to analyse the data. All the process details are described below:
3.1. For each mobility item, we first applied a word filter that was hypothesized to remove noise from the word groups. The filters considered were: no filter; filtering all words with fewer than 4 letters; filtering words contained in a public stop-words dictionary; and filtering words based on their occurrence, where words seen too often in the dataset were removed from their group.
3 The ICF terms went through the same pipeline of word filtering, Sentence-BERT and dimensionality reduction.
3.6. The k-means algorithm [52] was applied to all collected sentence embeddings to retrieve clusters of similar sentences.
3.7. To evaluate the quality of the resulting clusters, a Silhouette score [53,54] was used. A Silhouette score is a clustering metric ranging from -1 to 1, based on inter- and intra-cluster distances. A high Silhouette score means that sentences in a given cluster are similar and that different clusters are distinct. The Silhouette score was applicable in our case, but its ability to evaluate the quality of the model was limited by the sentence embeddings, as the vector distance between sentences in one cluster was not well fitted to mobility-related proximity. Therefore, we used the Silhouette score to shortlist promising clusterings and relied on expert input to select the final clustering.
3.8. We employ a grid search strategy to generate numerous clusterings from a range of key hyperparameters in our method. Namely, we searched over the following hyperparameter values: 1. k value in k-means, ranging from 4 to 40; 2. four (4)  We generate a clustering for every combination (n=592) of the above hyperparameter values.
We retain only the 10 best clusterings according to an automatic heuristic, described in Step 4. An expert then goes over the 10 retained clusterings and selects the most relevant one for further analysis. We argue that this two-step procedure is required by the intrinsic difficulty of the clustering evaluation task. Indeed, while the automatic heuristic filter first eliminates clusterings that are only weakly correlated, i.e., underfitting, the expert decision at the end detects clusterings that have good correlation metrics but low relevance to the overall objective, i.e., overfitting. Underfitting and overfitting commonly arise in unsupervised settings such as ours due to the lack of ground-truth labels to assess the true performance of the model.
ing: remove ambiguous, vague and parallel items; clarify items by adding or removing needed words; and label each item to an agreed-upon domain. The expert annotations were then used to fine-tune the Sentence-BERT model towards more meaningful mobility-related sentence embeddings. The final clustering agreed with the expert annotations with an F1-score of 80% [55,56].
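The Silhouette-based shortlisting described in steps 3.7 and 3.8 can be illustrated with a small, dependency-free sketch. It is not the code used in the study: the function names (`silhouette_score`, `keep_best`) are hypothetical, Euclidean distance is an assumption, and the toy 2-D points stand in for Sentence-BERT embeddings. Each candidate clustering is scored and only the highest-scoring ones are kept for expert review.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points, between -1 and 1."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def keep_best(
    points: List[Vector], candidates: Dict[str, List[int]], n_keep: int = 10
) -> List[Tuple[str, float]]:
    """Rank candidate clusterings by silhouette and keep the top n_keep."""
    scored = [(name, silhouette_score(points, labels))
              for name, labels in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_keep]


# Toy data: two well-separated groups of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_candidates = {"by_group": [0, 0, 1, 1], "mixed": [0, 1, 0, 1]}
shortlist = keep_best(toy_points, toy_candidates, n_keep=1)
```

In this toy case the clustering that follows the two groups scores close to 1 and the mixed one scores below 0, so only the former survives the shortlist; the final choice among shortlisted clusterings would still rest with the experts.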

Step 4: Preliminary banks of items selection process
The most critical part of our proposed model is the sentence embedding process. The pre-trained Sentence-BERT model was used to produce semantically accurate embeddings (Fig. 2). To ensure the quality of evidence, the following was done:
4.1. First iteration: a small subset of mobility items was analyzed by the Sentence-BERT model using the ICF terms from the ontology as a guide. At this step, the automatic heuristic retained for shortlisting the clusterings was the Silhouette score, due to the lack of automatically applicable human knowledge. The analysis yielded sentences that were correctly and incorrectly clustered. This information was used by the experts to create relations for sentence pairs that should or should not be clustered together.
4.2. Second iteration: the relations for sentence pairs that were extracted from the first iteration were used as training examples to fine-tune the Sentence-BERT model. The automatic heuristic employed was the accuracy metric on the binary classification relations identified by the experts at the end of step 4.1. The resulting clusters from the second iteration were analyzed again by the experts, who grouped hundreds of items by labelling them to an agreed-upon domain.
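To make the pair-based heuristic of step 4.2 concrete, the sketch below scores a clustering against expert relations of the form "these two items should (or should not) be clustered together". It is an illustration under stated assumptions: the function name `pairwise_scores`, the item identifiers and the relations are all hypothetical, and the study's own scoring code is not shown in the text.

```python
from typing import Dict, List, Tuple

# (item_a, item_b, experts say the two items belong together)
Relation = Tuple[str, str, bool]


def pairwise_scores(
    relations: List[Relation], cluster_of: Dict[str, int]
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 of a clustering on expert pairs."""
    if not relations:
        raise ValueError("at least one expert relation is required")
    tp = fp = fn = tn = 0
    for item_a, item_b, together in relations:
        predicted = cluster_of[item_a] == cluster_of[item_b]
        if predicted and together:
            tp += 1
        elif predicted and not together:
            fp += 1
        elif together:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(relations),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical items, cluster assignments and expert relations.
toy_clusters = {"walk_short": 0, "walk_long": 0, "climb_stairs": 0,
                "walk_obstacles": 1, "feel_sad": 1}
toy_relations = [
    ("walk_short", "walk_long", True),        # together, clustered together
    ("walk_short", "climb_stairs", False),    # apart, but clustered together
    ("walk_short", "walk_obstacles", True),   # together, but clustered apart
    ("walk_long", "feel_sad", False),         # apart, clustered apart
    ("climb_stairs", "feel_sad", False),      # apart, clustered apart
]
toy_scores = pairwise_scores(toy_relations, toy_clusters)
```

On the toy data the clustering gets three of the five expert relations right, which gives an accuracy of 0.6 and an F1 of 0.5.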

Fig. 2 The iterative improvement process for the preliminary item banks. The process began with an initial Sentence-BERT model and relied heavily on the ICF ontology to produce a good enough first clustering. At each step, a grid search was conducted over a wide range of hyperparameter values and a best clustering was retained according to automatic heuristics and human evaluation. After each clustering, expert annotations were collected to improve the Sentence-BERT model and yield better clusterings. We report the F1 score of each clustering with respect to the first and second expert annotations, respectively named E_1 and E_2. Here, E_2 is the most reliable metric, as it associates items with adequate labels, while E_1 associates item pairs with whether or not they belong together. By nature, E_1 penalizes having a large number of clusters, as can be seen in the third clustering's score. Also note that neither E_1 nor E_2 is an exact metric, as, for instance, the third clustering still required heavy fine-tuning by experts to yield a satisfying Core Outcome Set despite the near-perfect E_2 score.

Search results
The search strategy yielded a total of 47 reviews that met the eligibility criteria and were included [27,. 246 copies of mobility measures were retrieved, and from these 474 mobility domains and 2109 mobility items were extracted. Figure 3 presents the PRISMA flow diagram, including the selection process and the reasons for exclusion. Table 1 shows the hyperparameter values of the retained clustering for steps 4.1 to 4.3. Initially, our best grouping according to Silhouette score and expert knowledge resulted in 26 clusters. The experts reviewed each cluster of items and only included relevant and clear items. Duplicates (n=267), ambiguous parallel items (n=97), and items with fewer than 2 words (n=134) were removed, leaving 1611 of the 2109 items.
In addition, among the 1611 items, 245 (15%) items were considered as outliers, as they did not fit well enough within their cluster. Also, seven clusters were identified as outliers, as they included items labelled to more than one domain. Results from the 26 clusters showed that fifteen clusters had no outliers; six clusters contained 5% to 10% outliers; and ten clusters contained > 10% outliers.

Fig. 3 PRISMA flow diagram
After extensive discussion, experts decided not to eliminate the outliers that were not filtered by the algorithm or the clusters labelled to more than one domain, but to manually reassign them to better-fitting clusters. Additionally, five new clusters were generated from outliers not filtered by the algorithm. Overall, 602 (37%) of the items were reassigned in the fine-tuning process, resulting in a preliminary comprehensive outcome set of mobility comprising 24 domains, namely: Upper Extremity Mobility, Emotional Functions, Balance, Motor Control, Self-care, Social Life and Relationship, Cognition, Walking, Postural Transition, Recreation and Leisure Activities, Activities of Daily Living, Physical Functioning, Communication, Work/Study, Climbing, Sensory Functions, General Health, Fatigue, Functional Independence, Pain, Alcohol and Drugs Use, Transportation, Sleeping, and Finances (Fig. 4 and Table 2). Also, we define the comprehensive outcome set of mobility conceptually based on the ICF and Webber's frameworks in Table 3.
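The item counts reported in this section can be checked with a few lines of arithmetic. The snippet below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken from the Results text.
extracted_items = 2109
removed = {
    "duplicates": 267,
    "ambiguous parallel items": 97,
    "items with fewer than 2 words": 134,
}

# Items left after the experts' review of each cluster.
retained_items = extracted_items - sum(removed.values())

# Shares of the retained items flagged as outliers and reassigned.
outlier_share = 245 / retained_items
reassigned_share = 602 / retained_items

print(retained_items, f"{outlier_share:.0%}", f"{reassigned_share:.0%}")
# prints: 1611 15% 37%
```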

Discussion
In this study, we identified a comprehensive outcome set of mobility and developed preliminary banks of items of mobility, for use in evaluating mobility among individuals with ABI, using NLP. Our findings support that it is possible to use a variety of existing instruments of mobility to build preliminary banks of items with promising properties using NLP. Although the PROMIS physical functioning item bank was found to be unidimensional, Mobility was constructed to represent a sub-domain of physical functioning to be used among individuals with chronic illnesses [30,31,103]. This study identified 24 preliminary banks of items of mobility, which are to be used to evaluate each domain in a comprehensive outcome set of mobility among individuals with ABI.
Improved outcome measures can substantially enhance clinical research and make the research process more efficient. Clinical trials may require fewer subjects, and greater assurances may be given that the perspectives of the patient are included. The goal of this work was to construct comprehensive mobility tools. Previous studies have shown that better items can be obtained from large item banks: relevant and clear items that can be understood and are considered important by patients, with fewer floor and ceiling effects, and with standardised time frames, content, and response options that improve item structure and wording [26,32,33]. The identified banks of items are required for researchers and health care professionals to compile and compare common mobility outcomes and items from centre to centre or client to client, directly influencing the identification and implementation of best practices [104].
An understanding of the nature and severity of mobility limitations among individuals with ABI is needed in order to develop effective individualized treatment plans and to compare different interventions. This requires a comprehensive assessment of impairments, activity limitations, and participation restrictions. The intervention plan varies depending on the patients' personal context, goals, and the complex interplay of the factors that influence mobility [14,105]. This work provided a preliminary comprehensive outcome set of mobility from all possible sources, and mapped the constructs measured to the ICF. Results of this study will be used in the future to build an agreed-upon mobility COS. A Delphi approach [106][107][108] will be administered to achieve expert consensus (i.e., among clinicians, individuals with ABI and their caregivers), to examine the mobility COS, to assess experts' views on the importance, clarity, and relevance of the domains and items of mobility, to unify the language of measuring mobility among individuals with ABI, and to standardise measures used across clinical sites and studies.
In the rehabilitation sciences, developing NLP algorithms that can extract and process knowledge from text documents at a level that is usable by experts in the domain requires several elements that can generally be associated with intelligence [37,38]. Throughout the experiments, it became clear that expert knowledge was the key factor in obtaining more accurate clustering. In the beginning, no expert knowledge was used and the best architecture artificially incorporated expert knowledge by requiring the addition of the ICF terms and the filtering of words in a sentence. The resulting clusters were also hard to evaluate automatically due to the poor quality of the pre-trained Sentence-BERT embeddings for mobility-related tasks. The incorporation of expert knowledge gradually improved the quality of the resulting clusters. At the same time, the additional information allowed the Sentence-BERT model to be further fine-tuned, gradually reducing the need to insert artificial knowledge in the procedure. Namely, on the final iteration, the best performing architecture did not filter words and did not require ICF terms. This shows that with iterations and fine-tuning of sentence embeddings, models improve in capturing the added expert knowledge. We note that our fine-tuning approach can be seen as an active learning fine-tuning process of a language model, as was already proposed for image caption classification, for instance [109].
Step 2 was important in ensuring an item format that is consistent and coherent with the Sentence-BERT model's input requirements. We note, however, that while most of the tasks were done manually in our study, step 2 could be done entirely automatically. Since the purpose of the study is to leverage NLP to increase efficiency in generating outcome sets, we believe automating step 2 would be a straightforward and important task in future iterations.
Fig. 4 Identification of the mobility Core Outcome Set and preliminary item banks from the third, final clustering. In the fine-tuning step, items were considered outliers when they did not match well enough with the cluster they were in (clustering inaccuracy). Re-assigned items are items that changed cluster between the Cleaned Clustering and the Final Product. Re-assigned items include outliers but also items that were part of a large cluster that was split to make smaller and more precise clusters.
The use of item response theory (IRT) and computerised adaptive testing (CAT) is important in our next steps to provide item hierarchy and calibrate the items on a linear scale, respectively [110,111]. IRT models incorporate both the characteristics of items and characteristics of individuals and calculate the probability of a positive response, to classify items for each person [35,112,113]. CAT is a specific kind of computer-based testing that asks questions extracted from larger pools of items covering a wider range of item difficulty, providing a more precise way to decrease questionnaire burden [35,112,113]. Moreover, IRT can quantitatively estimate the properties of each item and eliminate poor items to optimise the matching of items for each patient using CAT applications.
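As a rough illustration of how an IRT model combines item and person characteristics, the sketch below uses a two-parameter logistic (2PL) response function and a maximum-information rule for picking the next item. Both are assumptions made for illustration: the text does not state which IRT model or CAT selection rule will be used, and the item names and parameter values are invented.

```python
import math
from typing import Dict, Tuple


def p_positive(theta: float, difficulty: float,
               discrimination: float = 1.0) -> float:
    """2PL probability of a positive response for a person of ability theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def item_information(theta: float, difficulty: float,
                     discrimination: float = 1.0) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = p_positive(theta, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)


def next_item(theta: float, bank: Dict[str, Tuple[float, float]]) -> str:
    """Pick the item that is most informative at the current ability estimate."""
    return max(bank, key=lambda name: item_information(theta, *bank[name]))


# Hypothetical item bank: name -> (difficulty, discrimination).
toy_bank = {
    "walk_indoors": (-1.5, 1.0),
    "walk_one_block": (0.0, 1.0),
    "climb_stairs": (1.5, 1.0),
}
first_pick = next_item(0.2, toy_bank)
```

With equal discrimination values, the most informative item is the one whose difficulty lies closest to the current ability estimate, which is how adaptive testing avoids administering items that are far too easy or too hard for a given person.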

Lessons Learned
"Shared language is important in leading adaptive change. When people begin to use the same words with the same meaning, they communicate more effectively, minimize misunderstandings, and gain the sense of being on the same page, even while grappling with significant differences on the issues" [114].
One of the barriers to implementing a COS of mobility for use among individuals with ABI has been the lack of a comprehensive common language describing domains of mobility in the healthcare professions. This lack of a common language has prevented the development of a classification system of representative knowledge (i.e., an ontology) that would allow experts to make decisions related to tailored intervention plans among individuals with ABI. We therefore began this robust methodology using NLP with the goal of establishing preliminary banks of items of mobility that could be mapped within the continuum of care. Lessons learned from this work include: First, NLP techniques require human annotations to thrive, as the work clearly indicated that expert knowledge was the key factor in obtaining more accurate clustering. Second, some measures included irrelevant and ambiguous items, and we were able to examine and eliminate them. Third, the provided banks of items of mobility considered other item banks not identified in the literature search, such as PROMIS. Toward that end, final consensus on a COS and banks of items of mobility needs to incorporate input from all stakeholders. Such item banks will provide a solid foundation to develop a commonly used ontology to inform selection of mobility outcomes and classification of mobility terms in digital health solutions and electronic medical records.

Limitations
During the process of retrieving copies of measures, we faced some challenges related to some of the technology-based and performance/clinician measures. These challenges include the difficulty of retrieving some technology-based measures, such as actical, actigraph, motionlogger, goniometers, caltrac accelerometer, gyroscopes, magnetometer and sensewear pro 3 armband, and the difficulty of extracting the domains and items for some technology-based measures (such as Global Positioning System (GPS)) and for some performance/clinician measures (such as gait speed, six-minute walking test, timed up and go test, and manual functional test). While our methodology improved the overall performance of the model, we note the following limitations in relation to the automatic NLP evaluation: traditional clustering metrics like the Silhouette score are of only limited use when comparing two different groupings produced by our model, due to the difficulty of interpreting sentence embeddings produced by neural networks. Also, the Silhouette score is not an accurate estimate with which to calibrate the items in the identified banks of items. Thus, the quality of our banks of items needs to be validated by expert knowledge to ensure that the resulting list of items covers the construct of mobility based on the ICF categories. Regarding the items, we did not account for the time frame and response options while analysing the clusters, as we only accounted for the content of the item. Finally, we note that, while our procedure was retained for its overall simplicity, other alternatives exist for sentence clustering. These alternatives are, however, out of the scope of the current paper due to the large amount of time required for experts to evaluate another clustering.

Table 3 The comprehensive Core Outcome Set of mobility defined conceptually based on the International Classification of Functioning, Disability, and Health, and Webber's frameworks
Cluster number and name: Definition
1. Upper Extremity Mobility: Defined as the ability to reach or lift an object from one place to another, and perform the coordinated actions of handling, picking up, manipulating and releasing objects using one's hand, fingers and thumb.
2. Emotional Functions: Defined as mental functions related to feelings, including depression, anxiety and anger.
3. Balance: Defined as the ability to maintain the body position within the base of support with minimal postural sway.
4. Motor Functions: Defined as functions associated with motor control and coordination of voluntary movements.
5. Self-care: Defined as the ability to care for oneself, including washing and drying oneself, dressing, eating and drinking, and looking after one's health.
6. Social life and Relationship: Defined as the ability to carry out the actions and tasks required for basic and complex interactions with people in a contextually and socially appropriate manner to engage in organized social life in community, social and civic areas of life.
7. Cognition: Defined as specific functions of the brain including memory and executive functions.
8. Walking: Defined as the ability to move along from point A to point B, including walking short or long distances; walking on different surfaces; and walking around and over obstacles.
9. Postural Transition: Defined as the ability to move from one surface to another without changing body position, such as moving from a bed to a chair.
10. Recreation and Leisure Activities: Defined as the ability to engage in any form of play, such as going to art galleries, museums, or cinemas for pleasure.
11. Activities of Daily Living: Defined as the ability to carry out everyday actions and tasks including acquiring a place to live, preparing meals, household cleaning and repairing.
12. Physical Functioning: Defined as the ability to do various activities that require increasing degrees of strength and endurance.
13. Communication: Defined as specific features of communicating by speaking or carrying on conversations, and comprehending.
14. Work/Study: Defined as the ability to engage in all aspects of work including seeking employment and getting a job, doing the required tasks or studies to get the job.
15. Climbing: Defined as the ability to move upwards or downwards over different surfaces, such as climbing stairs.
16. Sensory Functions: Defined as functions of sense including vision, auditory, smell, touch and taste.
17. General Health: Defined as the status of complete physical, mental and social well-being.
18. Fatigue: Defined as functions related to respiratory and cardiovascular capacity for enduring physical exertion.
19. Functional Independence: Defined as the ability to perform an activity with no or little help from others.
20. Pain: Defined as an unpleasant feeling that indicates potential or actual damage to some body structure.
21. Alcohol and Drug use: Defined as substances that are harmful use for the mental
22. Transportation: Defined as using transportation to move around, such as being driven in a car.
23. Sleeping: Defined as a characteristic physiological change accompanied by general mental functions of intermittent, reversible and selective physical and mental disengagement from one's immediate environment.
24. Finances: Defined as products, such as money, which serve as an exchange for labour, goods and services.

Conclusion
The comprehensive banks of items of mobility presented in this study have multiple uses. First, they represent a first step toward establishing a comprehensive COS and a common language of mobility among individuals with ABI to develop the ontology. Second, they enable researchers and healthcare professionals to begin exposing the content of mobility measures as a way to assess mobility comprehensively among individuals with ABI. Ultimately, using shared assessment items of mobility, it may be possible to adapt these items across the continuum of care. Our banks of items of mobility will soon be used to develop the ontology, allowing the stakeholders to make decisions about tailored individualized treatment plans. Lastly, the promising results obtained in this study provide a road map for using NLP in other health outcome areas, and we expect they will motivate future work in this direction to leverage alternative NLP techniques.